On Approximability of Block Sorting 



N. S. Narayanaswamy^ and Swapnoneel Roy^* 

^ Department of Computer Science and Engineering, 
Indian Institute of Technology Madras, Chennai, TN 600036, India. 
^ Department of Computer Science and Engineering, 
University at Buffalo, The State University of New York, Buffalo, NY 14260, USA. 



Abstract. Block Sorting is a well studied problem, motivated by its 
applications in Optical Character Recognition (OCR), and Computa- 
tional Biology. Block Sorting has been shown to be NP-Hard, and 
two separate polynomial time 2-approximation algorithms have been de- 
signed for the problem. But questions like whether a better approxima- 
tion algorithm can be designed, and whether the problem is APX-Hard 
have been open for quite a while now. 

In this work we answer the latter question by proving Block Sorting to
be Max-SNP-Hard (APX-Hard). The APX-Hardness result is based on
a linear reduction of MAX-3SAT to Block Sorting. We also provide
a new lower bound for the problem via a new parametrized problem,
k-Block Merging.

1 Introduction 

The Block Sorting problem is a combinatorial optimization problem: find the
minimum number of block moves required to sort a given permutation $\pi$.
A block is a maximal substring of $\pi$ which is also a substring of the sorted
(identity) permutation $id$. Block Sorting is motivated by its applications in
optical character recognition. In optical character recognition, text regions
referred to as zones are identified. The ordering of the zones is very important.
But in practice, the output generated by a zoning algorithm frequently differs
from the correct order. To measure how good a zoning algorithm is, we
need the minimum number of steps required to transform the string generated
by the zoning algorithm into the correct string. Computing this number of
steps is equivalent to Block Sorting. Hence it is important to design efficient
algorithms for Block Sorting, and to know more about its computational complexity.
Block Sorting also gains importance from the fact that it is a nontrivial
variation of a very well known problem, Sorting by Transpositions,
which is motivated by the study of genome rearrangements in computational
biology. In a transposition, we are allowed to move any substring of $\pi$ to a
different position at each step [5]. Sorting by Transpositions minimizes the number
of such moves needed to sort $\pi$. It is easy to see that a block move is a
transposition, but not vice versa. Sorting by Transpositions has recently been
shown to be NP-Hard [7]. The best known algorithm for Sorting by Transpositions has
an approximation ratio of 1.375 [6]. It is not yet known whether Block Sorting
approximates Sorting by Transpositions to any factor better than 3. But it
is known that optimal transpositions never need to break existing blocks [8]. This
shows how closely the two problems are related. The study of the computational
complexity of Block Sorting therefore might provide us with more insight into
the complexity of Sorting by Transpositions. It is still not known whether
Sorting by Transpositions is APX-Hard^1 or admits a PTAS^1.
Block Sorting is also closely related to another problem called Sorting by
Short Block-Moves. In a short block move, we are allowed to move an element
of $\pi$ to at most two positions away from its original position. Sorting by Short
Block-Moves minimizes the number of such moves required to sort $\pi$ [9]. The
problem is motivated by its applications in the study of genome rearrangements
and in the design of interconnection networks. It is easily observed that a short
block move is also a block move, but not vice versa. The complexity of
Sorting by Short Block-Moves is still open. It has been studied extensively,
and recently a PTAS has been designed for the problem [10]. We believe that our
results on the complexity of Block Sorting could help resolve the computational
complexity of Sorting by Short Block-Moves, which has been open for close
to a decade and a half now.



2 Overview of the results and techniques 

The set $\{1, 2, \ldots, n\}$ is denoted by $[n]$. Let $S_n$ denote the set of all
permutations over $[n]$, and $id_n$ the sorted or identity permutation of length $n$.
The given permutation $\pi \in S_n$ to be sorted is represented as a string
$\pi_1 \pi_2 \cdots \pi_n$ without loss of generality.

Definition 1 (Block). A block is a maximal substring of a given permutation
$\pi$ which is also a substring of the identity permutation $id$.



As an example, the permutation 8 2 5 6 3 9 1 4 7 contains 8 blocks, and 5 6
is the only block of length more than one. A block move picks up a block and
places it elsewhere in the permutation. A block sorting schedule is a sequence
of block moves that sorts a given permutation $\pi$. The minimum number of such
moves required is called the block sorting distance $bs(\pi)$ of permutation $\pi$. An
example of the block move of moving block 1 to block 2 for
$\pi$ = 3 1 4 6 2 5 7 9 8 10 is shown in Figure 1 in Appendix B.
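The definitions of a block and a block move can be made concrete with a small sketch. The helper names `blocks` and `move_block` below are our own illustrative choices (they do not come from the cited works), and permutations are assumed to be non-empty Python lists.

```python
def blocks(perm):
    """Split perm into its blocks: maximal runs x, x+1, x+2, ... of adjacent entries."""
    runs = [[perm[0]]]
    for x in perm[1:]:
        if x == runs[-1][-1] + 1:
            runs[-1].append(x)      # x extends the current block
        else:
            runs.append([x])        # x starts a new block
    return runs


def move_block(perm, src, dst):
    """Remove block number src (0-based) and reinsert it so that it becomes
    block number dst among the remaining blocks. Blocks are never broken."""
    bl = blocks(perm)
    moved = bl.pop(src)
    bl.insert(dst, moved)
    return [x for b in bl for x in b]


pi = [8, 2, 5, 6, 3, 9, 1, 4, 7]
print(len(blocks(pi)))                      # 8 blocks; [5, 6] is the only long one

example = [3, 1, 4, 6, 2, 5, 7, 9, 8, 10]
print(move_block(example, 1, 3))            # moves block 1 next to block 2
```

In the second call the moved block joins its neighbours, so the number of blocks drops from 10 to 8.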
Any block in $\pi$ can be replaced by a single element without loss of generality.
Hence the permutation 7 2 5 3 8 1 4 6 is equivalent to 8 2 5 6 3 9 1 4 7. We can
do this because we do not break blocks once they get joined to form larger blocks
in a block move. The resulting permutation is termed a reduced permutation [2]
or a kernel permutation [3].

^1 Defined in Appendix A.
Block Sorting can be stated as:



Block Sorting Problem

Input: A permutation $\pi$ and an integer $m$.

Question: Is $bs(\pi) \le m$?
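For very small permutations the decision problem above can be answered by exhaustive search. The following is only a brute-force sketch under our own assumptions: the names `blocks`, `neighbours` and `bs` are illustrative, blocks are only reinserted at block boundaries (so existing blocks are never broken), and the running time is exponential, so it is meant for experimentation rather than as an algorithm for the problem.

```python
from collections import deque


def blocks(perm):
    """Maximal runs x, x+1, ... of adjacent entries of perm (non-empty)."""
    runs = [[perm[0]]]
    for x in perm[1:]:
        if x == runs[-1][-1] + 1:
            runs[-1].append(x)
        else:
            runs.append([x])
    return runs


def neighbours(perm):
    """All permutations reachable from perm by one block move."""
    bl = blocks(perm)
    for i in range(len(bl)):
        rest = bl[:i] + bl[i + 1:]
        for j in range(len(rest) + 1):
            if j != i:                      # j == i would recreate perm itself
                new = rest[:j] + [bl[i]] + rest[j:]
                yield tuple(x for b in new for x in b)


def bs(perm):
    """Block sorting distance of perm, found by breadth-first search."""
    start, goal = tuple(perm), tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for nxt in neighbours(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)


print(bs([3, 2, 1]))   # 2
```

With such a routine, the question "Is $bs(\pi) \le m$?" reduces to comparing the returned value with $m$.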



In [3], it has been formally proved that block-sorting $\pi$ is equivalent to
block-sorting its kernel $ker(\pi)$. That is, $bs(\pi) = bs(ker(\pi))$. It was also
shown that in an optimal block-sorting sequence, we never need to break apart an
existing block at any step. That is, the block-sorting distance remains the same
even if we allow block-sorting moves which do not necessarily join blocks, or
which break previously joined blocks.
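The kernel of a permutation can be computed by keeping one representative per block and renumbering. The sketch below is a minimal illustration; the function name `kernel` is our own and the input is assumed to be a non-empty list.

```python
def kernel(perm):
    """Replace every block of perm by a single element and renumber 1..#blocks."""
    heads = [perm[0]]
    for prev, cur in zip(perm, perm[1:]):
        if cur != prev + 1:          # cur starts a new block
            heads.append(cur)
    # Rank the block representatives to obtain a permutation of 1..len(heads).
    rank = {v: r for r, v in enumerate(sorted(heads), start=1)}
    return [rank[h] for h in heads]


print(kernel([8, 2, 5, 6, 3, 9, 1, 4, 7]))   # [7, 2, 5, 3, 8, 1, 4, 6]
```

The output matches the equivalent permutation 7 2 5 3 8 1 4 6 of the running example.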

Block Sorting was proved to be NP-Hard via a reduction from 3SAT in [2].
We reduce MAX-3SAT to Block Sorting via a linear reduction and prove
it to be Max-SNP-Hard. We achieve this by proving a new technical lemma
(Lemma 7) for block sorting. We prove that the number of moves in any block
sorting schedule for any permutation is at least the sum of the number of reversals
in the permutation and the number of disconnected components in the red-blue
graph constructed from that schedule. Our results show that Block Sorting does
not admit a PTAS unless P = NP.

Definition 2 (Reversal). In a permutation $\pi$, a reversal is a pair of consecutive
elements $ab$ such that $a > b$. Formally, $a$ and $b$ form a reversal in $\pi$ if $a > b$
and $\pi_b = \pi_a + 1$, where $\pi_x$ denotes the position of element $x$ in $\pi$.

Let the number of reversals in $\pi$ be $rev(\pi)$. In [2], it has been shown that a block
sorting sequence of length $rev(\pi)$ is optimal, since the block sorting distance
$bs(\pi) \ge rev(\pi)$.
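Counting reversals is a single pass over adjacent pairs. The sketch below uses our own function name `rev` for illustration and simply counts the positions where an element is larger than its immediate successor.

```python
def rev(perm):
    """Number of reversals: adjacent pairs (a, b) in perm with a > b."""
    return sum(1 for a, b in zip(perm, perm[1:]) if a > b)


pi = [8, 2, 5, 6, 3, 9, 1, 4, 7]
print(rev(pi))                         # 3, from the pairs (8,2), (6,3), (9,1)
print(rev([7, 2, 5, 3, 8, 1, 4, 6]))   # 3, the kernel has the same reversals
```

For the running example this gives the lower bound $bs(\pi) \ge 3$.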

In [2] the authors constructed a permutation $\pi$ from an arbitrary 3SAT
boolean formula $\Phi$, such that

• $\Phi$ is satisfiable if and only if $bs(\pi) = rev(\pi)$.

In this work we construct a permutation $\pi$ from an arbitrary MAX-3SAT instance,
a boolean formula $\Phi$ with $m$ clauses, such that

• If all the $m$ clauses of $\Phi$ are satisfiable, then $bs(\pi) = rev(\pi)$.

• If at most $m - c$ clauses of $\Phi$ are satisfiable, then $bs(\pi) \ge rev(\pi) + c$.

The above proves Block Sorting to be Max-SNP-Hard, which is one of our
results.

The question whether Block Sorting is APX-Hard has been open for quite some
time, and we answer it in this work. We believe this result will also provide
more insight into the complexity of the more general Sorting by Transpositions,
which has recently been proved to be NP-Hard. A similar reduction technique
might also be used to prove the hardness of Sorting by Short Block-Moves,
whose complexity remains unresolved to date.

The problem Block Merging was introduced in [3] in conjunction with
obtaining a factor 2 approximation algorithm for Block Sorting. The permutation
$\pi$ can be uniquely decomposed into maximal increasing subsequences. The
input to Block Merging is the set of these increasing subsequences $\mathbb{S}_\pi$. If
$\pi$ = 8 2 5 6 3 9 1 4 7, then $\mathbb{S}_\pi = \{(8), (2, 5, 6), (3, 9), (1, 4, 7)\}$ is an input
instance for Block Merging. The goal of Block Merging is to transform $\mathbb{S}_\pi$ to the
multiset $\mathbb{M}_n = \{id_n, \varepsilon, \ldots, \varepsilon\}$ using the minimum number of block moves. A block
move on a block is permitted if and only if the block is contained in at most
one increasing subsequence. At the beginning, every block in $\pi$ is contained in
exactly one increasing subsequence. But during the execution of a block-merging
schedule, a block might get fragmented over several increasing subsequences. Let
$bm(\mathbb{S}_\pi)$, called the block-merging distance of $\pi$, be the minimum number of such
moves needed to transform $\mathbb{S}_\pi$ to $\mathbb{M}_n$.
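The decomposition of a permutation into its maximal increasing subsequences (the Block Merging input) is again a single scan. The function name `increasing_runs` below is our own illustrative choice, and the input is assumed to be a non-empty list.

```python
def increasing_runs(perm):
    """Split perm into maximal increasing runs of adjacent entries."""
    runs = [[perm[0]]]
    for x in perm[1:]:
        if x > runs[-1][-1]:
            runs[-1].append(x)      # still increasing: stay in the same run
        else:
            runs.append([x])        # a descent starts a new run
    return [tuple(r) for r in runs]


print(increasing_runs([8, 2, 5, 6, 3, 9, 1, 4, 7]))
# [(8,), (2, 5, 6), (3, 9), (1, 4, 7)]
```

The number of runs is always one more than the number of reversals, since every reversal ends exactly one run.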

In [3] the authors proved a new lower bound for Block Sorting via Block
Merging as

• $bs(\pi) \ge \frac{bm(\mathbb{S}_\pi)}{2}$

They further proved that Block Merging $\in$ P, which yields a 2-approximation
algorithm for Block Sorting via Block Merging.

We parametrize Block Merging to formulate another problem, $k$-Block Merging,
for any integer $k$. In $k$-Block Merging, we have the same input and goal
as Block Merging, but we allow a block to be moved if it is contained in at
most $k$ increasing subsequences. Hence Block Merging is $k$-Block Merging
with $k = 1$. Let $k\text{-}bm(\mathbb{S}_\pi)$ be the $k$-block merging distance for $\pi$.
We prove a new lower bound for Block Sorting via $k$-Block Merging as

• $bs(\pi) \ge \frac{k\text{-}bm(\mathbb{S}_\pi)}{1 + \frac{1}{k}}$

In other words, $k$-Block Merging approximates Block Sorting by a factor
of $1 + \frac{1}{k}$. We know $k$-Block Merging $\in$ P for $k = 1$. But we do not know
anything about its complexity for $k > 1$. If we can prove it to be polynomial for
$k = 2$, we have a 1.5-approximation algorithm for Block Sorting. To recall,
the best known approximation algorithms for Block Sorting have a factor of
2. On the other hand, if we prove it to be NP-Hard for $k = 2$, then we actually
prove Block Sorting to be inapproximable to within a factor of 1.5, which
improves the gap we achieve in our reduction.

3 The Red-Blue Graph for Block Sorting 

Given $\pi$ and a block sorting sequence $S$ for $\pi$, we construct the red-blue graph
$G(\pi, S)$, whose vertices are the blocks of $\pi$, in the following way:

1. A blue edge is constructed between the participating blocks $a$ and $b$ of each
reversal $(a, b)$.

2. A red edge is constructed between two blocks $a$ and $b$ if:

- $a < b$ and $\pi_a < \pi_b$,

- $a$ and $b$ are joined in the sequence $S$ before either is moved, and

- if $\pi_a < \pi_c < \pi_b$, then block $c$ is moved before $a$ and $b$ are joined.

The intuition behind the red edges is to treat the two participating blocks as
already in their correct positions [2]. We need to move only the other blocks in
the block sorting schedule. Thus we effectively save one move per red edge. Hence
the construction of $G(\pi, S)$ depends on the given block sorting schedule $S$
on $\pi$. We now present a few properties of the red edges proved in [2], stated
here without proof. $\pi(i)$ is the position of block $i$ in $\pi$.

Lemma 1. For any $\pi$ and any block sorting schedule $S$ on $\pi$, $G(\pi, S)$ is
acyclic over both red and blue edges.

Lemma 2. [2] Any node $x$ can have a red degree of at most 2: one edge from $y$ to
$x$ where $y < x$, and another from $x$ to $z$ where $x < z$.

Lemma 3. [2] If $\pi(a) < \pi(c) < \pi(b) < \pi(d)$, there cannot be both a red edge
from $a$ to $b$ and a red edge from $c$ to $d$.

Lemma 4. [2] If $a < c < b < d$, there cannot be both a red edge from $a$ to $b$
and a red edge from $c$ to $d$.

Lemma 5. [2] If $a < c < d < b$ and $\pi(c) < \pi(a) < \pi(b) < \pi(d)$, there cannot
be both a red edge from $a$ to $b$ and a red edge from $c$ to $d$.

A pair of red edges (or pairs of elements $(a, b)$ and $(c, d)$ in $\pi$) which violate any
of Lemmas 2, 3, 4, or 5 cross each other.



Definition 3 (Perfect block sorting). [2] A perfect block sorting schedule
$S_{perfect}$ on $\pi$ is a block sorting schedule which sorts $\pi$ in $rev(\pi)$ moves. Note
that $rev(\pi)$ is equal to the number of blue edges in the graph $G(\pi, S_{perfect})$.

Perfect block sorting is optimal since $bs(\pi) \ge rev(\pi)$.

Lemma 6. There exists a perfect block sorting schedule $S_{perfect}$ on $\pi$ if and
only if $G(\pi, S_{perfect})$ is a tree.

Given $\pi$, the blue edges of $\pi$ are always fixed in $G(\pi, S)$, for any block sorting
schedule $S$. Hence when we refer to blue edges, we talk about $\pi$ only
instead of $G(\pi, S)$. The red edges vary for different block sorting schedules.

Definition 4 (Blue component). In a permutation $\pi$, all the blue edges connect
elements which are adjacent to each other. We define the components connected
by zero or more blue edges as blue components of a red-blue graph. All the blue
components are substrings of $\pi$.

As an example, for $\pi$ = 8 2 5 6 3 9 1 4 7, the blue components are {8 2},
{5 6 3}, {9 1}, {4}, and {7}. The number of blue components in $\pi$ is equal
to the difference between the number of blocks in $\pi$ and $rev(\pi)$. Formally,
$\#blue\text{-}components(\pi) = \#blocks(\pi) - rev(\pi)$. In a red-blue graph of a perfect
block sorting schedule, all these blue components are connected, since the graph
is a tree. Hence there can be at most $\#blocks(\pi) - rev(\pi) - 1$ red edges in any
red-blue graph of $\pi$. Intuitively, the more disconnected blue components there
are in the red-blue graph $G(\pi, S)$, the more moves there are in $S$. Each red edge
saves one move. Let the number of disconnected components in any graph be
defined as $\#disconnected\text{-}components(G(\pi, S)) =
\#connected\text{-}components(G(\pi, S)) - 1$. Having $d$ disconnected components
signifies the absence of $d$ red edges to connect them to the rest of the graph.
Therefore $d$ disconnected components signify at least $d$ more moves for $S$ than
for $S_{perfect}$. We formally prove this in Lemma 7. The length of $S$ is the number
of moves in $S$.
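The blue components of a permutation depend only on the permutation itself, so they can be listed directly. The following sketch is illustrative only: the function name `blue_components` is ours, and two adjacent entries are placed in the same component when they belong to the same block or form a reversal.

```python
def blue_components(perm):
    """Group adjacent entries of perm that are linked by a block or a reversal."""
    comps = [[perm[0]]]
    for prev, cur in zip(perm, perm[1:]):
        if prev > cur or cur == prev + 1:
            comps[-1].append(cur)   # reversal (blue edge) or same block
        else:
            comps.append([cur])     # ascending, non-consecutive: new component
    return comps


pi = [8, 2, 5, 6, 3, 9, 1, 4, 7]
comps = blue_components(pi)
print(comps)   # [[8, 2], [5, 6, 3], [9, 1], [4], [7]]

# Check the identity #blue-components = #blocks - rev on the example.
n_blocks = 1 + sum(1 for a, b in zip(pi, pi[1:]) if b != a + 1)
n_rev = sum(1 for a, b in zip(pi, pi[1:]) if a > b)
print(len(comps) == n_blocks - n_rev)   # True
```

On the example there are 8 blocks and 3 reversals, hence 5 blue components and at most 4 red edges.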

Lemma 7. The length of any block sorting schedule $S$ on $\pi$ is at least
$rev(\pi) + \#disconnected\text{-}components(G(\pi, S))$.

The proof of Lemma 7 is in Appendix E. Theorem 1 follows from Lemma 7.

Theorem 1. For any permutation $\pi$, if $\#disconnected\text{-}components(G(\pi, S)) \ge d$
for every possible block sorting schedule $S$ on $\pi$, then $bs(\pi) \ge rev(\pi) + d$ for
that permutation $\pi$.

Theorem 1 says that for any $\pi$, if we show that the number of disconnected blue
components in the red-blue graph of any block sorting schedule is at least $d$,
then the lower bound for block sorting for $\pi$ is at least $d$ more than $rev(\pi)$. We
are now in a position to prove the APX-hardness of Block Sorting.

4 Block Sorting is Max-SNP-Hard

We use the construction from [2] to reduce MAX-3SAT to Block Sorting. Consider
an instance of MAX-3SAT consisting of a boolean formula $\Phi = C^1 C^2 \cdots C^m$
of $n$ variables and $m$ clauses $C^1, C^2, \ldots, C^m$. In [2], a permutation $\pi$ of
$8m + 4n + 1$ elements was constructed from $\Phi$ by introducing an ordered alphabet
$\Sigma_{n,m}$ with $4nm + 2m + 4n + 1$ elements. A block sorting schedule $S$ of length
$6m + 2n - 1$ has been shown to exist for $\pi$ if and only if $\Phi$ is satisfiable.
Here we use that construction to show the following:

1. $\text{Max-3SAT}(\Phi) = m \Rightarrow bs(\pi) = 6m + 2n - 1$.

2. $\text{Max-3SAT}(\Phi) \le m - c \Rightarrow bs(\pi) \ge 6m + 2n - 1 + c$.

The alphabet $\Sigma_{n,m}$ consists of the following elements:

1. Term symbols: $p_i^j$, $\bar{p}_i^j$, $q_i^j$, $\bar{q}_i^j$ for all $1 \le i \le n$ and $1 \le j \le m$.
$p_i^j, \bar{p}_i^j$ and $q_i^j, \bar{q}_i^j$ are called left and right term symbols respectively.

2. Clause control symbols: $\ell^j$ and $r^j$ for all $1 \le j \le m + n$.

3. Variable control symbols: $u_i$ and $v_i$ for all $1 \le i \le n$.

4. Separator symbol: $s$.

The ordering on $\Sigma_{n,m}$ is generated by the following rules:

1. Ui < p'l < pj < p^ < pi < q{ < q'^ < qj < qf < Vi < s \f 1 < i < n, and 
$1 \le j < k \le m$.

2. $v_{i-1} < u_i$ for all $1 < i \le n$.

3. $s < \ell^k < r^k < \ell^j < r^j$ for all $1 \le j < k \le m + n$.

Let the names of the variables be $x_i$ for all $1 \le i \le n$. Each clause is assumed to be
of the form $(z_a \lor z_b \lor z_c)$, $a > b > c$, without loss of generality, where $z_i$ is either
$x_i$ or $\bar{x}_i$. The simple encoding of a clause uses symbols $p_i$ and $q_i$ for literal $x_i$,
and $\bar{p}_i$ and $\bar{q}_i$ for literal $\bar{x}_i$. It consists of eight symbols starting with an
$\ell$ and ending with an $r$. The remaining symbols are the term symbols, which start
with the $p$'s and end with the $q$'s. As an example, the simple encoding of the clause
$(x_5 \lor x_3 \lor x_2)$ is $\ell\, p_5\, p_3\, p_2\, q_5\, q_3\, q_2\, r$. For the real encoding of a clause, the index of
that clause is inserted as its superscript. If $C^j = (x_5 \lor x_3 \lor x_2)$, then its encoding
would be $\ell^j p_5^j p_3^j p_2^j q_5^j q_3^j q_2^j r^j$. The clause encodings appear in $\pi$ in the order
$C^1$, followed by $C^2$, and so on till $C^m$.

After the real encodings of the clauses, the control sequences $\ell^{m+i} u_i v_i r^{m+i}$ are
added for all $1 \le i \le n$. Finally the element $s$ is added as the first element. Hence
the derived permutation $\pi$ contains $s$, followed by the encodings of the clauses
in order, followed by the control sequences for $i$ from 1 to $n$. An example of the
reduction of $\Phi = (x_3 \lor x_2 \lor x_1) \land (x_4 \lor x_3 \lor x_2)$ is shown in Figure 3 of Appendix C.
In this figure, we have two clause components in $\pi$, for the two clauses in $\Phi$.
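The layout of the derived permutation can be illustrated symbolically. The sketch below only builds the sequence of symbol names as strings and does not assign numeric values, so it does not depend on the ordering rules. The function name `encode`, the string format of the symbols, the use of signed integers for literals, and the prefix `~` for barred symbols are all our own conventions for illustration.

```python
def encode(clauses, n):
    """Symbolic layout of the permutation built from a 3-CNF formula.

    clauses: list of 3-tuples of non-zero ints; +i stands for x_i, -i for its negation.
    n: number of variables.
    """
    m = len(clauses)
    seq = ["s"]                                            # separator comes first
    for j, clause in enumerate(clauses, start=1):
        lits = sorted(clause, key=abs, reverse=True)       # indices a > b > c
        bar = ["~" if lit < 0 else "" for lit in lits]
        seq.append(f"l^{j}")
        seq += [f"{b}p_{abs(x)}^{j}" for b, x in zip(bar, lits)]   # left term symbols
        seq += [f"{b}q_{abs(x)}^{j}" for b, x in zip(bar, lits)]   # right term symbols
        seq.append(f"r^{j}")
    for i in range(1, n + 1):                              # control sequences
        seq += [f"l^{m + i}", f"u_{i}", f"v_{i}", f"r^{m + i}"]
    return seq


layout = encode([(3, 2, 1), (4, 3, 2)], n=4)
print(len(layout))       # 33 = 8m + 4n + 1 for m = 2, n = 4
print(layout[:9])
```

Each clause contributes eight symbols and each variable four, which together with the separator gives the $8m + 4n + 1$ elements of $\pi$.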

Lemma 8. $G(\pi, S)$ has $6m + 2n - 1$ blue edges for any block sorting schedule
$S$ on $\pi$, and hence $bs(\pi) \ge 6m + 2n - 1$.

Lemma 8 follows from the fact that $rev(\pi) = 6m + 2n - 1$. We state and prove the
lemmas in this section in a top-down manner. Lemma 10, Lemma 11, and Lemma 12
are stated and proved first. Next we state and prove the lemmas which lead to
Lemma 10 and Lemma 11. The omitted proofs appear in Appendix F.
The blue components of any $G(\pi, S)$ are:

1. The blue component with the $s$ symbol. There is 1 such component.

2. The blue components with the $p_i^j$ (or $\bar{p}_i^j$) symbols, for each $1 \le j \le m$,
$1 \le i \le n$. There are $m$ such components.

3. The blue components with the $q_i^j$ (or $\bar{q}_i^j$) symbols, for each $1 \le j \le m$,
$1 \le i \le n$. There are $m$ such components.

4. The blue components with the $u_i$ symbols, for each $1 \le i \le n$. There are $n$
such components.

5. The blue components with the $v_i$ symbols, for each $1 \le i \le n$. There are $n$
such components.

6. The blue component with the $r^{m+n}$ symbol. There is 1 such component.

Hence there are in total $2m + 2n + 2$ blue components of any $G(\pi, S)$ for $\pi$. The
red-blue graph of a perfect schedule on $\pi$ is a tree. Since $G(\pi, S)$ is acyclic [5],
there can be at most $2m + 2n + 1$ red edges to connect these $2m + 2n + 2$ blue
components.

The three $q_i^j$ (or $\bar{q}_i^j$) symbols, $1 \le i \le n$, in the component of each clause
$C^j$, $1 \le j \le m$, are joined by two blue edges, and form a blue component in any
red-blue graph $G(\pi, S)$. We call these the blue components containing $q$ symbols for
each clause component. Therefore out of the $2m + 2n + 2$ blue components, $m$
are blue components containing $q$ symbols.

Lemma 9. There is a one-to-one correspondence between all possible assignments
of $\Phi$ and all possible arrangements of the set of red edges
$\{(p_i^j, q_i^j)$ and $(\bar{p}_i^j, \bar{q}_i^j) : 1 \le i \le n, 1 \le j \le m\}$.

Lemma 10. Given an assignment of $\Phi$, if a clause $C^j \in \Phi$ is unsatisfied,
we cannot have any red edge between any $(p_i^j, q_i^j)$ (or $(\bar{p}_i^j, \bar{q}_i^j)$), $1 \le i \le n$,
for the component of clause $C^j$ in any red-blue graph $G(\pi, S)$.

Lemma 11. For any red-blue graph $G(\pi, S)$, the blue component containing $q$
symbols of any clause can be connected to the graph $G(\pi, S)$ only via a red edge
of type $(p_i^j, q_i^j)$ (or $(\bar{p}_i^j, \bar{q}_i^j)$) for $1 \le j \le m$ and $1 \le i \le n$, without disconnecting
another blue component from $G(\pi, S)$.

Lemma 12. $\text{Max-3SAT}(\Phi) \le m - c \Rightarrow bs(\pi) \ge 6m + 2n - 1 + c$.



Proof. Given an assignment of $\Phi$, if $C^j$ is unsatisfied in $\Phi$, by Lemma 10
the blue component with $q$ symbols of $C^j$ will be disconnected in any $G(\pi, S)$.
Therefore when $\text{Max-3SAT}(\Phi) \le m - c$, for $c$ such unsatisfied clauses, we would
have $c$ such disconnected blue components with $q$ symbols in any red-blue graph
$G(\pi, S)$. If we try to connect any of these disconnected blue components with
$q$ symbols via any red edge other than one of type $(p_i^j, q_i^j)$ (or $(\bar{p}_i^j, \bar{q}_i^j)$) for
$1 \le i \le n$, it will disconnect at least one other blue component from $G(\pi, S)$ by
Lemma 11. This proves that when $\text{Max-3SAT}(\Phi) \le m - c$, we have at least $c$
disconnected blue components in any red-blue graph $G(\pi, S)$. Therefore by
Theorem 1 we have $bs(\pi) \ge 6m + 2n - 1 + c$. □

Lemma 13. [2] $\text{Max-3SAT}(\Phi) = m \Rightarrow bs(\pi) = 6m + 2n - 1$.

Proof. This was already proved in [2]. In outline, at least one literal in each
clause in $\Phi$ is true. Hence we can find all the $m$ red edges of type $(p_i^j, q_i^j)$
mentioned in Corollary 2. □

Lemma 14. It is NP-Hard to approximate Block Sorting to within a factor
of 1.02.

Proof. Taking $c = \frac{m}{8}$, we have from Lemma 12:
$\text{Max-3SAT}(\Phi) \le \frac{7m}{8} \Rightarrow bs(\pi) \ge 6m + 2n - 1 + \frac{m}{8}$.
This proves the lemma. □

Lemma 14 leads to Theorem 2.

Theorem 2. Block Sorting is Max-SNP-Hard (APX-Hard).

We now state and prove the following lemmas, which lead to Lemma 10 and
Lemma 11.

Lemma 15. In $\pi$, the pairs $(u_i, v_i)$ for all $1 \le i \le n$, and $(\ell^j, r^j)$ for all
$1 \le j \le m + n$, can always be joined to form blocks before they are moved in any
block sorting schedule $S$ on $\pi$.

We define a set of red edges $E = \{(u_i, v_i)$ for all $1 \le i \le n$, $(\ell^j, r^j)$ for all
$1 \le j \le m + n$, and $(s, \ell^{m+n})\}$ for any red-blue graph $G(\pi, S)$.

Lemma 16. In any red-blue graph $G(\pi, S)$, no red edges in the set $E$ cross each
other. Hence the number of red edges which could be drawn in any $G(\pi, S)$ is at
least $m + 2n + 1$.

Corollary 1. There exists a block sorting schedule $S$ for $\pi$ of length $7m + 2n - 1$
steps.

The $|E| = m + 2n + 1$ red edges connect the respective blue components to each
other. Therefore $m + 2n + 2$ blue components of $G(\pi, S)$ get connected to each
other by the set of edges $E$. The blue components that do not get connected
by the edges of $E$ are the blue components with the $q$ symbols. Lemma 17 is a
direct implication of Claims A, B, and C of the proof of Lemma 11 of [2].

Lemma 17. [2] In any red-blue graph $G(\pi, S)$, the set $E$ is the only set of
non-crossing red edges which can connect all the $m + 2n + 2$ blue components
of $G(\pi, S)$ to which they belong. So any edge of $E$ absent from $G(\pi, S)$ would
imply at least one disconnected blue component in $G(\pi, S)$.

Lemma 18. At most one pair $(p_i^j, q_i^j)$ (or $(\bar{p}_i^j, \bar{q}_i^j)$) from each clause encoding
$1 \le j \le m$ and each variable $1 \le i \le n$ can be joined to form blocks before they
are moved. Moreover, if $(p_i^j, q_i^j)$ are joined before they are moved for any clause
$1 \le j \le m$, then $(\bar{p}_i^k, \bar{q}_i^k)$ cannot be joined before they are moved for any clause
$1 \le k \le m$, $1 \le i \le n$, and $k \ne j$ in any block sorting schedule $S$ on $\pi$.

Corollary 2. In any red-blue graph $G(\pi, S)$, there can be at most $m$ red edges
between the pairs $(p_i^j, q_i^j)$ (or $(\bar{p}_i^j, \bar{q}_i^j)$) for $1 \le i \le n$ and $1 \le j \le m$, one such
red edge in each clause encoding. Furthermore, if there is a red edge between
$(p_i^j, q_i^j)$ in a clause encoding for $1 \le i \le n$ and $1 \le j \le m$, there cannot be a
red edge between $(\bar{p}_i^k, \bar{q}_i^k)$ in any clause encoding for $1 \le i \le n$, $1 \le k \le m$,
and $k \ne j$.

5 A New Lower Bound for Block Sorting 

The problem Block Merging has been introduced in [3]. It is defined as follows:

Block Merging Problem

Input: A multiset $\mathbb{S} = \{S_1, S_2, \ldots, S_l\}$ of disjoint increasing sequences
whose union is $[n]$, and an integer $m$.

Output: The multiset $\mathbb{M}_n = \{id_n, \varepsilon, \ldots, \varepsilon\}$.

Constraint: A block is allowed to be moved if it is contained in
at most one increasing sequence $S_i$.

Question: Is $bm(\mathbb{S}) \le m$?

The block merging distance $bm(\mathbb{S})$ is the minimum number of block moves needed to
transform $\mathbb{S}$ to $\mathbb{M}_n$. A block move is defined as: pick a block from any sequence
$S_i$, and insert it into some other sequence $S_j$ so that it merges with a block
there. A block is allowed to be moved if it is contained in at most one increasing
sequence $S_i$. Any permutation $\pi$ can easily be decomposed into the multiset
of maximal increasing subsequences $\mathbb{S}_\pi$, which can be an input to Block
Merging. Lemmas 19 and 20 have been proved in [3].

Lemma 19. Block Merging $\in$ P.

Lemma 20. For any $\pi$, $bm(\mathbb{S}_\pi) \ge bs(\pi) \ge \frac{bm(\mathbb{S}_\pi)}{2}$.

Lemma 20 gives a new lower bound for Block Sorting. It also says that
Block Merging approximates Block Sorting by a factor of 2. Lemma 19
along with Lemma 20 gives a polynomial time factor 2 approximation algorithm for
Block Sorting. All the omitted proofs of this section appear in Appendix G.
We relax the constraint for Block Merging and define the problem $k$-Block
Merging. Specifically, in $k$-Block Merging we are allowed to move a block
which is contained in at most $k$ increasing sequences. We are done when we have
at most $k$ sequences whose concatenation gives the identity permutation $id_n$.
Formally, we define $k$-Block Merging as:



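$k$-Block Merging Problem

Input: A multiset $\mathbb{S} = \{S_1, S_2, \ldots, S_l\}$ of disjoint increasing sequences
whose union is $[n]$, and integers $m$ and $k \le l$.
Output: The multiset $\mathbb{M}_n = \{S'_1, S'_2, \ldots, S'_k, \varepsilon, \ldots, \varepsilon\}$
such that $S'_1 S'_2 \cdots S'_k = id_n$.

Constraint: A block is allowed to be moved if it is contained in
at most $k$ increasing sequences $S_i$.
Question: Is $k\text{-}bm(\mathbb{S}) \le m$?

$k\text{-}bm(\mathbb{S})$ is the minimum number of $k$-block merging moves needed to transform
$\mathbb{S}$ to $\mathbb{M}_n$.

Lemma 21. $bs(\pi) \le k\text{-}bm(\mathbb{S}_\pi) \le bm(\mathbb{S}_\pi)$.

Lemma 22. $bs(\pi) \ge \frac{k\text{-}bm(\mathbb{S}_\pi)}{1 + \frac{1}{k}}$.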

Proof. For any permutation $\pi$, given a block sorting schedule $b_1, b_2, \ldots, b_m$ of $m$
moves, we need to prove that there is a $k$-block merging schedule of at most
$m(1 + \frac{1}{k})$ moves. For any block sorting move $b_i$, if $b_i$ moves the block $B \in \pi$,
we move $B$ in the corresponding $k$-block merging move if it is contained in at
most $k$ increasing sequences in $\mathbb{S}_\pi$. Else, if $B$ is contained in $x > k$ increasing
sequences, we perform $\lfloor \frac{x-1}{k} \rfloor$ moves to get $B$ into at most $k$ increasing sequences.
Next we move $B$. This way, each move in the block sorting sequence can be
performed by one or more moves in the $k$-block merging sequence.
We define two sets $Inc(\pi) = \{\pi_i \mid \pi_i < \pi_{i+1},\ 1 \le i < n\}$ and
$Inc(\mathbb{S}) = \{i \in [n-1] \mid i$ is not the last element of any subsequence $S \in \mathbb{S}\}$.
In other words, $Inc(\pi)$ is the set of elements of $\pi$ which are less than their
immediate successor in $\pi$. We have $Inc(id_n) = Inc(\mathbb{M}_n) = [n-1]$. At the
beginning, for any $\pi$, we have $Inc(\pi) = Inc(\mathbb{S}_\pi)$. But as we execute a block sorting
schedule and its corresponding $k$-block merging schedule, $Inc(\mathbb{S}^i_\pi) \subseteq Inc(\pi^i)$ at
any step $i$. The reason is that a block $B$ can be fragmented at step $i$, and that
would make $Inc(\mathbb{S}^i_\pi)$ miss elements which are in $Inc(\pi^i)$. Hence the
defragmentation steps that we perform for $k$-block merging actually decrease the
difference between $Inc(\mathbb{S}^i_\pi)$ and $Inc(\pi^i)$ for step $i$.

Let $c_i$ be the actual cost of the corresponding $k$-block merging moves performed
for a single block sorting move at step $i$. Then $c_i = 1 + \lfloor \frac{x-1}{k} \rfloor$. We perform an
amortized analysis with an amortized cost $a_i$ for each step, chosen so that the total
amortized cost is at least the total actual cost, and we bound $a_i$ by $1 + \frac{1}{k}$ for
all $i$. For step $i$, let $P_i = Inc(\pi^i)$ and $Q_i = Inc(\mathbb{S}^i_\pi)$.
Further, let $|P_i| = p_i$ and $|Q_i| = q_i$. We define the potential function $\phi_i = \frac{p_i - q_i}{k}$.
Then the amortized cost for step $i$ becomes $a_i = c_i + \phi_i - \phi_{i-1}$.
To complete the proof, we need to show that $\phi_i - \phi_{i-1} \le \frac{1}{k} - \frac{x-1}{k}$. We have
$\phi_i - \phi_{i-1} = \frac{(p_i - q_i) - (p_{i-1} - q_{i-1})}{k} = \frac{(p_i - p_{i-1}) - (q_i - q_{i-1})}{k}$. We calculate the change in
$p_i - p_{i-1}$ and $q_i - q_{i-1}$ for all the $k$-block merging moves of step $i$. Let the block
$B$ be moved to its predecessor block $A$ by the block move $b_i$ (the other case is
analogous).

We first bound the value of $p_i - p_{i-1}$. Let the first and last elements
of the block $B$ be $b$ and $c$ respectively, and the last element of block $A$ be $a$, that
is $a = b - 1$. It is clear that $a \in P_i$, since $a < b$. Further, $c \in P_i \Rightarrow a \in P_{i-1}$.
Now we observe that $p_i - p_{i-1} = 1$ if

1. either $a \notin P_{i-1}$ and $c \notin P_{i-1}$,

2. or exactly one among $a$ and $c$ is in $P_{i-1}$, and $a \in P_i$.

In every other case $p_i - p_{i-1} = 0$.

Consider the $k$-block merging moves that simulate block move $b_i$. Recall that the
block $B$ is fragmented within $x$ increasing sequences. We need to make at most
$\lfloor \frac{x-1}{k} \rfloor$ moves to bring $B$ into at most $k$ increasing subsequences. Then we move
$B$ to its predecessor $A$, as done by block move $b_i$. Since $k \ge 1$, the maximum
number of such defragmentation moves performed here is $x - 1$. These $x - 1$
moves can contribute at most $x - 1$ to $q_i - q_{i-1}$. In fact they contribute $x - 1$ if
$c \notin Q_{i-1}$, else they contribute $x - 2$. Again, the block move that moves $B$ to $A$ adds
either $a$ or $c$ to $Q_i$ and hence contributes 1 to $q_i - q_{i-1}$. But we have $a \in Q_i$.
Hence the overall contribution by the block move is 1 if $a \notin Q_{i-1}$, else it is 0.
To sum it up, block move $b_i$ increases $p_i - p_{i-1}$ by 0 or 1, and the equivalent
$k$-block merging moves increase $q_i - q_{i-1}$ by either at most $x - 1$ or at most
$x - 2$. In the latter case, we have both $a$ and $c \in Q_{i-1}$, and hence $a$ and $c \in P_{i-1}$,
which means $p_i - p_{i-1} = 0$.

Therefore we have $\phi_i - \phi_{i-1} = \frac{(p_i - p_{i-1}) - (q_i - q_{i-1})}{k} \le \frac{1 - (x-1)}{k} = \frac{1}{k} - \frac{x-1}{k}$. □

Lemma 21 and Lemma 22 lead to Theorem 3. Theorem 2 and Theorem 3 lead to
Corollary 3.

Theorem 3. $k$-Block Merging approximates Block Sorting by a factor of
$1 + \frac{1}{k}$.

Corollary 3. $k$-Block Merging is NP-Hard for $k \ge 48$.

Lemma 22 gives us a new lower bound for Block Sorting via $k$-Block Merging.
In [3], given any $\pi$ and $\mathbb{S}_\pi$, a directed graph $G = (V, E)$ has been
constructed such that $V = [n]$, and $(u, v) \in E$ if $u$ and $v$ belong to the same
increasing subsequence in $\pi$. Two edges $(i, j)$ and $(k, l)$ cross each other if
$i < k < j < l$ or $k < i < l < j$. A set $E' \subseteq E$ is called a non-crossing set
if no two edges of $E'$ cross. The size of a largest non-crossing set in $\mathbb{S}_\pi$ is denoted
by $c(\mathbb{S}_\pi)$. Lemma 23 has been proved in [3].

Lemma 23. $bm(\mathbb{S}_\pi) = n - 1 - c(\mathbb{S}_\pi)$.

Lemma 24. k-bm{S^) > 

Corollary 4. k-bm{S^) > 

Corollary 4 tells us that the polynomial time algorithm of [3] for Block Merging
is actually a $k$-approximation algorithm for $k$-Block Merging. We know
$k$-Block Merging to be polynomial time solvable for $k = 1$ from [3], and our
results prove it to be NP-Hard for $k \ge 48$. But we do not know whether it is
polynomial time solvable for $k = 2$. If it is, then we would have a 1.5-approximation
algorithm for Block Sorting. But if it is not, then Block Sorting would be
inapproximable to within a factor of 1.5. It is still open whether we can design
an algorithm with an approximation ratio better than 2 for Block Sorting.

A A Few Definitions



A.1 APX

A problem is said to belong to the class APX if we can design a constant factor
approximation algorithm for it. By definition APX $\subseteq$ NP.



A.2 Polynomial Time Approximation Scheme (PTAS)

A PTAS is an algorithm which takes an instance of an optimization problem and
a parameter $\varepsilon > 0$ and, in polynomial time, produces a solution that is within a
factor $1 + \varepsilon$ of being optimal (or $1 - \varepsilon$ for maximization problems). For example,
for the Euclidean traveling salesman problem, a PTAS would produce a tour
with length at most $(1 + \varepsilon)L$, with $L$ being the length of the shortest tour.



A.3 APX-hardness

A problem is said to be APX-Hard if we cannot design any PTAS for it unless
P = NP.



B An Example of a Block Move and a Block Sorting
Schedule




$\pi$:  3 1 4 6 2 5 7 9 8 10

$\pi'$: 3 4 6 1 2 5 7 9 8 10

Fig. 1. An example of a block move



A block sorting schedule is shown on permutation 8 2 5 6 3 9 1 4 7 in Figure 2.
The block moves are indicated at each step.



Fig. 2. An example of a block sorting schedule



C An Example of the Reduction Procedure 



A red-blue graph for the $\pi$ in Figure 3 has been drawn taking $x_2 = 1$ and 
$x_4 = 1$. Since $\Phi$ is satisfied by this assignment, this red-blue graph is 
a tree. This signifies that the corresponding block sorting schedule is perfect. 



Fig. 3. An example of the reduction for $\Phi = (x_3 \vee x_2 \vee x_1) \wedge (x_4 \vee x_3 \vee x_2)$ 

D An Example of an Unsatisfied Clause in $\Phi$ 



Fig. 4. An example of the encoding of an unsatisfied clause. 



E Omitted Proofs from Section 3 

E.1 Proof of Lemma 7 

From [2], we know the following: 

1. For any block sorting schedule $S$ on any $\pi$, if $m$ is the number of moves 
in $S$, $\#blocks(\pi)$ is the number of blocks in $\pi$, and $\#red(G(\pi,S))$ is 
the number of red edges of $G(\pi,S)$, then $m = \#blocks(\pi) - 1 - \#red(G(\pi,S))$. 

2. $G(\pi,S)$ for any $\pi$ and any $S$ on $\pi$ is acyclic. 

The second property implies that if there are $b$ blue components in any 
$G(\pi,S)$, then the number of red edges $\#red(G(\pi,S))$ is at most $b - 1$. 
Therefore we have $m \ge \#blocks(\pi) - b$. If we have $d$ disconnected blue 
components in $G(\pi,S)$, then we have at most $b - 1 - d$ red edges in 
$G(\pi,S)$. By the first property, in this case we have 
$m \ge \#blocks(\pi) - b + d$, and $\#blocks(\pi) - b = rev(\pi)$. This proves the 
lemma. □ 
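
For reference, the chain of inequalities used in this proof can be written out in one place. This is only a restatement of the two properties quoted from [2], with $b$ the number of blue components and $d$ the number of disconnected blue components:

```latex
\begin{align*}
m &= \#blocks(\pi) - 1 - \#red(G(\pi,S))
     && \text{(property 1)} \\
  &\ge \#blocks(\pi) - 1 - (b - 1 - d)
     && \text{(acyclic, $d$ components disconnected)} \\
  &= \#blocks(\pi) - b + d .
\end{align*}
```

Setting $d = 0$ gives the weaker bound $m \ge \#blocks(\pi) - b$ stated first.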

F Omitted Proofs from Section 4 

F.1 Proof of Corollary 1 

This follows from the fact that we can always have the edges in $E$ in any 
$G(\pi,S)$. Therefore, we can always construct a block sorting schedule of 
length $7m + 2n - 1$ in the following way: 

1. Move all the term symbols to their proper places in any order. This takes 
$6m$ moves (one move for each symbol). 

2. Now move the pairs $u_i v_i$ to their proper places. Note that they have 
already formed blocks by the above step. This takes $n$ moves. 

3. All the $\ell^j r^j$ pairs have formed blocks, but are in the reverse order 
after the above two steps. So getting them in order requires $m + n - 1$ block 
moves. □ 
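
The count in the three steps above can be checked mechanically. The following is a minimal sketch; the function name `schedule_length` is hypothetical, and $m$ and $n$ denote the numbers of clauses and variables as elsewhere in the paper.

```python
def schedule_length(m, n):
    """Length of the schedule built in the three steps above."""
    term_moves = 6 * m        # step 1: one move per term symbol
    uv_moves = n              # step 2: one move per u_i v_i pair
    lr_moves = m + n - 1      # step 3: reorder the m + n reversed pairs
    return term_moves + uv_moves + lr_moves


# The total always equals 7m + 2n - 1.
for m in range(1, 6):
    for n in range(1, 6):
        assert schedule_length(m, n) == 7 * m + 2 * n - 1

print(schedule_length(2, 4))  # a formula with 2 clauses and 4 variables
```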

F.2 Proof of Lemma 9 

This is already implied by the construction of the permutation $\pi$ from the 
formula $\Phi$. Specifically, an assignment for $\Phi$ satisfies $k$ clauses, 
say $C^1$ to $C^k$, if and only if we have a red-blue graph $G(\pi,S)$ with a 
set of $k$ non-crossing edges of the form $(p_i^j, q_i^j)$ or 
$(\bar{p}_i^j, \bar{q}_i^j)$, $1 \le i \le n$, $1 \le j \le k$, with exactly one 
edge from the component of each clause in $G(\pi,S)$. To prove this, we observe 
that when clauses $C^1$ to $C^k$ are satisfied, we have at least one true 
literal for each clause. We pick one true literal $x_i$ (or $\bar{x}_i$) for 
each clause $C^j$, $1 \le j \le k$. The $k$ red edges $(p_i^j, q_i^j)$ or 
$(\bar{p}_i^j, \bar{q}_i^j)$ corresponding to $x_i$ (or $\bar{x}_i$) for each 
clause do not cross. For the other direction, if we have $k$ non-crossing red 
edges in any $G(\pi,S)$, one from each clause component, then we can set the 
corresponding literal to be true for that clause and satisfy those $k$ clauses. 
This property of $\pi$ was used, along with other properties, to prove the 
NP-hardness of Block Sorting. □ 



F.3 Proof of Lemma 10 

Since $C^j$ is unsatisfied in $\Phi$, for all literals $z_i$ ($x_i$ or 
$\bar{x}_i$) $\in C^j$, $1 \le i \le n$, we have $\bar{z}_i$, the complement of 
$z_i$, true in at least another clause $C^k$, and $\bar{z}_i$ satisfies clause 
$C^k$. Therefore by Lemma 18, any pair $(p_i^j, q_i^j)$, $1 \le i \le n$, in the 
component of clause $C^j$ will cross at least one pair 
$(\bar{p}_i^k, \bar{q}_i^k)$, $1 \le i \le n$, in the component of clause $C^k$, 
for $1 \le k \le m$ and $k \ne j$. By Corollary 2, any red edge 
$(p_i^j, q_i^j)$, $1 \le i \le n$, would cross at least one other red edge of 
the form $(\bar{p}_i^k, \bar{q}_i^k)$ for $1 \le k \le m$ and $k \ne j$ in any 
red-blue graph $G(\pi,S)$. Since we have all the other red edges (for the 
components other than that of clause $C^j$) in $G(\pi,S)$, we cannot have any 
red edge drawn between any pair $(p_i^j, q_i^j)$ for the clause component of 
$C^j$. □ 



F.4 Proof of Lemma 11 

We prove that for any red-blue graph $G(\pi,S)$, the blue component containing 
the $q$ symbols of any clause $C^j$ can be connected to the graph $G(\pi,S)$ 
only via a red edge of type $(p_i^j, q_i^j)$ (or $(\bar{p}_i^j, \bar{q}_i^j)$) 
for $1 \le j \le m$ and $1 \le i \le n$. Any other red edge drawn to any of the 
symbols $q_i^j$ of the blue component will cross a red edge $\in E$ of Lemma 16, 
leading to its removal from $G(\pi,S)$. This disconnects another blue component 
from $G(\pi,S)$ by Lemma 17. We further show that any red edges other than type 
$(p_i^j, q_i^j)$ (or $(\bar{p}_i^j, \bar{q}_i^j)$) drawn to the blue components 
with $q$ symbols for different clauses $C^j$, $1 \le j \le m$, would cross 
different red edges $\in E$. 

We now exhaustively consider all other red edges to connect the blue component 
with $q$ symbols to $G(\pi,S)$. We show that each of these red edges is either 
non-existent, or crosses another red edge $\in E$, hence disconnecting another 
blue component from $G(\pi,S)$. We will prove things for $(p_i^j, q_i^j)$ and 
without loss of generality will omit $(\bar{p}_i^j, \bar{q}_i^j)$. 

1. There cannot be any red edge from $s$ to $q_i^j$, any $r^k$ to $q_i^j$, 
$\ell^k$ to $q_i^j$, and from $q_i^j$ to $u_i$ for $1 \le i \le n$, 
$1 \le k < j \le m$. This is true because $s > q_i^j$, $r^k > q_i^j$, 
$\ell^k > q_i^j$, and $q_i^j > u_i$, $\forall\, 1 \le i \le n$, 
$1 \le k < j \le m$. And $\pi(s) < \pi(q_i^j)$, $\pi(r^k) < \pi(q_i^j)$, 
$\pi(\ell^k) < \pi(q_i^j)$, and $\pi(q_i^j) < \pi(u_i)$, 
$\forall\, 1 \le i \le n$, $1 \le k < j \le m$. Hence the pairs $(s, q_i^j)$, 
$(r^k, q_i^j)$, $(\ell^k, q_i^j)$, and $(q_i^j, u_i)$, 
$\forall\, 1 \le i \le n$, $1 \le k < j \le m$, are not in order. 

2. A red edge cannot be drawn between pairs $(p_i^j, q_k^j)$ from each clause 
encoding $1 \le j \le m$, and variables $1 \le i \le n$, $1 \le k \le n$ for 
$i \ne k$, without disconnecting a blue component. The pairs $(p_i^j, q_k^j)$ 
for $1 \le k < i \le n$ and $1 \le j \le m$ are not in order, since 
$p_i^j > q_k^j$ and $\pi(p_i^j) < \pi(q_k^j)$ for $1 \le k < i \le n$ and 
$1 \le j \le m$. 

For $1 \le i < k \le n$ and $1 \le j \le m$, $u_i < p_i^j < v_i < q_k^j$ and 
$p_i^j < u_k < q_k^j < v_k$. Hence the red edges $(p_i^j, q_k^j)$ for 
$1 \le i < k \le n$ and $1 \le j \le m$ cross both the red edges 
$(u_i, v_i) \in E$ and $(u_k, v_k) \in E$ by Lemma 4. 

3. A red edge cannot be drawn from any $q_i^j$ to any $r^k$ for 
$1 \le i \le n$, $1 \le j < k \le m$, without disconnecting a blue component. 
We have $u_i < q_i^j < v_i < r^k$ for $1 \le i \le n$, $1 \le j < k \le m$. 
Hence red edges $(q_i^j, r^k)$ and $(u_i, v_i) \in E$ cross by Lemma 4 for 
$1 \le i \le n$, $1 \le j < k \le m$. Therefore drawing red edge 
$(q_i^j, r^k)$ disconnects the blue component with the $v_i$ symbol from the 
graph. 

Moreover, the red edge $(q_i^j, r^k)$ for any $1 \le i \le n$, 
$1 \le j < k \le m$, also crosses the red edge $(\ell^j, r^j) \in E$ for that 
$j$, by Lemma 2. Hence drawing a red edge of type $(q_i^j, r^k)$ for 
$1 \le i \le n$, $1 \le j < k \le m$, disconnects more than one blue component 
from the graph. 

4. A red edge cannot be drawn from any $q_i^j$ to any $\ell^k$ for 
$1 \le i \le n$, $1 \le j < k \le m$, without disconnecting a blue component. 
We have $u_i < q_i^j < v_i < \ell^k$ for $1 \le i \le n$, $1 \le j < k \le m$. 
Hence red edges $(q_i^j, \ell^k)$ and $(u_i, v_i) \in E$ cross by Lemma 4 for 
$1 \le i \le n$, $1 \le j < k \le m$. Therefore drawing red edge 
$(q_i^j, \ell^k)$ disconnects the blue component with the $v_i$ symbol from the 
graph. 

Again we have $\pi(\ell^j) < \pi(q_i^j) < \pi(r^j) < \pi(\ell^k)$ for 
$1 \le i \le n$, $1 \le j < k \le m$. Hence red edges $(q_i^j, \ell^k)$ and 
$(\ell^j, r^j) \in E$ for $1 \le i \le n$, $1 \le j < k \le m$, cross by 
Lemma 3. Hence drawing a red edge of type $(q_i^j, \ell^k)$ for 
$1 \le i \le n$, $1 \le j < k \le m$, disconnects more than one blue component 
from the graph. 

5. A red edge cannot be drawn from any $q_i^j$ to any $v_k$ for 
$1 \le i \le n$, $1 \le j \le m$, and $1 \le k \le m$ without disconnecting a 
blue component. For $k < j$, we have $q_i^j > v_k$ and 
$\pi(q_i^j) < \pi(v_k)$. Hence the pair $(q_i^j, v_k)$ is out of order for 
$k < j$. 

For $k > j$, if we draw a red edge from $q_i^j$ to $v_k$, the red edges 
$(q_i^j, v_k)$ and $(u_k, v_k) \in E$ cannot coexist, by the crossing 
conditions of Lemmas 2 to 5. Again we have 
$\pi(q_i^j) < \pi(\ell^{m+k}) < \pi(v_k) < \pi(r^{m+k})$. Hence red edges 
$(q_i^j, v_k)$ and $(\ell^{m+k}, r^{m+k}) \in E$ cross by Lemma 3. Hence again 
more than one blue component gets disconnected in this case. The component 
with the $v_k$ symbol, and the component with the 

Red edges other than type $(p_i^j, q_i^j)$ drawn to connect the blue component 
with symbol $q_i^j$ for different clauses $C^j$ cross different edges of the set 
$E$, as stated earlier. □ 
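
The case analysis above repeatedly uses one pattern: two pairs cross when their endpoints interleave, either by value (as in the appeals to Lemma 4) or by position in $\pi$ (as in the appeals to Lemma 3). The statements of those lemmas are not reproduced in this appendix, so the following is only an illustrative sketch of that interleaving test as inferred from how the lemmas are used here. The function names and the toy permutation are hypothetical.

```python
def crosses_by_value(pair1, pair2):
    """True if the two pairs interleave by value: a < c < b < d or c < a < d < b."""
    (a, b), (c, d) = pair1, pair2
    return a < c < b < d or c < a < d < b


def crosses_by_position(pair1, pair2, perm):
    """True if the two pairs interleave by their positions in perm."""
    pos = {value: index for index, value in enumerate(perm)}
    (a, b), (c, d) = pair1, pair2
    return crosses_by_value((pos[a], pos[b]), (pos[c], pos[d]))


# Value pattern u < q < v < r: the pairs (u, v) and (q, r) interleave.
print(crosses_by_value((1, 3), (2, 4)))                    # interleaved
print(crosses_by_value((1, 2), (3, 4)))                    # disjoint
print(crosses_by_value((1, 4), (2, 3)))                    # nested

# Pairs that are disjoint by value can still interleave by position.
toy_perm = [1, 3, 2, 4]
print(crosses_by_position((1, 2), (3, 4), toy_perm))
```

Keeping the value test and the position test separate mirrors the proof, where each case names which of the two orders produces the crossing.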



F.5 Proof of Lemma 15 

We need to show the following for all the pairs mentioned in the lemma: 

1. They are in order: $a < b$ and $\pi(a) < \pi(b)$, $\forall (a, b)$. 

2. Any two pairs do not violate any conditions of Lemmas 2, 3, 4, or 5. That 
is, they do not cross each other. 

To recall, the boolean formula $\Phi$ has $n$ variables and $m$ clauses. We have 
$\ell^j < r^j$ and $\pi(\ell^j) < \pi(r^j)$, $\forall\, 1 \le j \le m + n$. Hence 
the pairs $(\ell^j, r^j)$ are in order $\forall\, 1 \le j \le m + n$. Further, 
$u^k < v^k$ and $\pi(u^k) < \pi(v^k)$, $\forall\, 1 \le k \le n$. Hence the 
pairs $(u^k, v^k)$ are in order $\forall\, 1 \le k \le n$. We state and prove 
the following claim: 

Claim. For any $1 \le i \le m + n$, $1 \le j \le m + n$, $1 \le k \le n$, the 
pair $(\ell^i, r^i)$ does not cross the pairs $(\ell^j, r^j)$ and $(u^k, v^k)$. 

Proof. 1. $\ell^i < r^i < \ell^j < r^j$, $\forall\, 1 \le j < i \le m + n$, and 
$\pi(\ell^i) < \pi(r^i) < \pi(\ell^j) < \pi(r^j)$, 
$\forall\, 1 \le i < j \le m + n$. Hence the pairs $(\ell^i, r^i)$ and 
$(\ell^j, r^j)$ do not cross $\forall\, 1 \le i \le m + n$, $1 \le j \le m + n$. 

2. $u^k < v^k < \ell^j < r^j$, $\forall\, 1 \le k \le n$, $1 \le j \le m + n$, 
and $\pi(\ell^j) < \pi(u^k) < \pi(v^k) < \pi(r^j)$, $\forall\, 1 \le k \le n$, 
$j = m + k$. 

3. $\pi(\ell^j) < \pi(r^j) < \pi(u^k) < \pi(v^k)$, $\forall\, 1 \le k \le n$, 
$j < m + k$, and $\pi(u^k) < \pi(v^k) < \pi(\ell^j) < \pi(r^j)$, 
$\forall\, 1 \le k \le n$, $j > m + k$. 

2. and 3. imply that the pairs $(u^k, v^k)$ and $(\ell^j, r^j)$ do not cross 
each other $\forall\, 1 \le k \le n$, $1 \le j \le m + n$. 

The claim completes the proof of Lemma 15. □ 



F.6 Proof of Lemma 16 

With a similar argument as in the proof of Lemma 15, we can show that the pair 
$(s, \ell^1)$ does not cross the pairs $(u^i, v^i)$, $\forall\, 1 \le i \le n$, 
or the pairs $(\ell^i, r^i)$, $\forall\, 1 \le i \le m + n$. Hence we can always 
draw red edges between the pairs $(u^i, v^i)$, $(\ell^i, r^i)$, and 
$(s, \ell^1)$. There are $n$ red edges of type $(u^i, v^i)$, $m + n$ red edges 
of type $(\ell^i, r^i)$, and a single red edge of type $(s, \ell^1)$. This makes 
the total number of such red edges $m + 2n + 1$. □ 



F.7 Proof of Lemma 17 

In a red-blue graph of a perfect block sorting schedule $G(\pi, S_{perfect})$, 
all the $m + 2n + 2$ blue components mentioned in Lemma 17 are connected. 
Claims A, B, and C of the proof of Lemma 11 of [2] prove: 

1. The union of these $m + 2n + 2$ blue components forms a connected subgraph 
of $G(\pi, S_{perfect})$. 

2. The only way to connect all of these $m + 2n + 2$ blue components in 
$G(\pi, S_{perfect})$ is to have all the edges of the set $E$. 

The two implications above lead to the fact that $E$ is the only set of edges 
which can connect all of these $m + 2n + 2$ blue components in any $G(\pi,S)$. 
Hence any edge of $E$ absent from $G(\pi,S)$ will imply at least one 
disconnected blue component in $G(\pi,S)$. □ 

F.8 Proof of Lemma 18 

1. We have 7r(p'[) < n{pl) < n{ql,) < n{qj), for 1 < i < n, 1 < j < m. Hence 
by Lemma [3] they cross. 

2. We have pi < Pi < ql < qi ioi 1 < i < n, 1 < j < m, and 1 < fc < m. Hence 
by Lemma [4] they cross. □ 

G Omitted Proofs from Section 5 

G.1 Proof of Lemma 21 

To prove this, we just observe that for $k = 1$, which is the minimum value $k$ 
can have, $k$-Block Merging reduces to Block Merging. For the maximum value $k$ 
can have, which is simply equal to the number of increasing sequences $S_\pi$ 
has, $k$-Block Merging reduces to Block Sorting. Hence the above inequalities 
hold for any $1 \le k \le$ (# of increasing sequences in $S_\pi$). □ 

G.2 Proof of Lemma 23 

The proof sketch of Lemma 23 is based on the fact that a block is allowed to be 
moved in Block Merging if it is contained in at most one increasing sequence. 
This in fact ensures that at most one edge can be added to $c(S_\pi)$ by a block 
merging move. Also, a valid block merging move in which at least one block is 
merged adds at least one edge to $c(S_\pi)$. Hence $c(S_\pi)$ can be increased 
by exactly 1 by each block merging move. 

Further, it has been shown in [3] that a block merging move that increases 
$c(S_\pi)$ by 1 can always be found in polynomial time. Since $c(M_n) = n - 1$, 
this gives a polynomial time exact algorithm for block merging. □ 

G.3 Proof of Lemma 24 

Since we allow a block to be moved in $k$-Block Merging if it is contained in 
at most $k$ increasing subsequences, we can add at most $k$ new edges to 
$c(S_\pi)$ by a $k$-block merging move. A block fragmented across $k$ 
increasing sequences in $S_\pi$ will add $k$ edges to $c(S_\pi)$ when moved into 
one increasing subsequence by a single $k$-block merging move. □ 
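
The counting argument of G.2 and G.3 can be turned into a small calculation. The sketch below rests on two assumptions taken from the text: the final configuration has $c(M_n) = n - 1$ edges, and a single move adds exactly one edge in Block Merging and at most $k$ edges in $k$-Block Merging. The function names are hypothetical, and the bound returned is what this counting argument yields, not necessarily the exact statement of Lemma 24.

```python
def block_merging_moves(n, c):
    """Moves needed when each move adds exactly one edge and n - 1 are required."""
    return n - 1 - c


def k_block_merging_lower_bound(n, c, k):
    """Fewest moves possible if each move adds at most k edges (ceiling division)."""
    missing_edges = n - 1 - c
    return -(-missing_edges // k)


n, c = 10, 3                      # toy values: 10 elements, 3 edges present
for k in (1, 2, 4):
    print(k, k_block_merging_lower_bound(n, c, k))
```

For $k = 1$ the bound coincides with the exact Block Merging count, and for larger $k$ it is smaller by a factor of at most $k$, which is consistent with the $k$-approximation reading of Corollary 4.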

H An Application of Block Sorting in OCR 

In Figure 5, we illustrate this concept with an example inspired by an 
application in optical character recognition. Here the text "How ? they did it 
do" has been recognized, but not in the correct order "How did they do it ?". 
We observe that it requires 3 block moves to sort the permutation by using 
Block Sorting. The blocks are moved and combined with other blocks to form 
larger blocks at each step. 



A F C B E D   (How ? they did it do) 

move F and combine:   A C B E D 

move C and combine:   A E D 

move D and combine:   A 


Fig. 5. Block sorting string "How did they do it?" (Reproduced from [5]). 
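
The three moves of Fig. 5 can be replayed on the word sequence. This is a minimal sketch, assuming the intended order is the one in the caption ("How did they do it ?") and numbering the words 1 to 6 in that order. The helper names and the encoding of a move as (start index, length, destination index) are hypothetical.

```python
WORDS = {1: "How", 2: "did", 3: "they", 4: "do", 5: "it", 6: "?"}


def count_blocks(seq):
    """Number of maximal runs of consecutive increasing values in seq."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if b != a + 1)


def move(seq, start, length, dest):
    """Cut seq[start:start+length] and reinsert it at index dest of the rest."""
    piece = seq[start:start + length]
    rest = seq[:start] + seq[start + length:]
    return rest[:dest] + piece + rest[dest:]


perm = [1, 6, 3, 2, 5, 4]                  # "How ? they did it do"
schedule = [(1, 1, 4),                     # move "?" behind "it"
            (1, 1, 2),                     # move "they" behind "did"
            (5, 1, 3)]                     # move "do" in front of "it"

block_counts = [count_blocks(perm)]
for start, length, dest in schedule:
    perm = move(perm, start, length, dest)
    block_counts.append(count_blocks(perm))

sentence = " ".join(WORDS[i] for i in perm)
print(sentence)
print(block_counts)
```

The block counts after each move follow the letter rows of the figure: six blocks at the start, then five, three, and finally one.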



